Poster: Alex.brollo Date: Mar 28, 2014 4:01am
Forum: forums Subject: Question about djvu.xml and abbyy.xml

While thinking about a DjVu text editor, I've become very interested in how data flows into the djvu.xml and abbyy.xml files.

1. I imagine that the FineReader OCR step produces both the abbyy.xml file and the djvu file, with its hidden, mapped text layer. Is that true?

2. I also imagine that djvu.xml is produced by extracting it from the djvu file with the DjVuLibre djvutoxml tool or a similar one; that is, it is not produced from abbyy.xml. Is that true?

3. The DjVu text layer could be edited by changing the text in djvu.xml and then reloading the hidden text into the djvu file with the DjVuLibre djvuxmlparser tool or a similar one. Is there any project, routine, or idea for doing this?

4. Years ago, I first came across Internet Archive while reading about reCAPTCHA. Is reCAPTCHA still running now? And if it is, how are the words fixed by reCAPTCHA used?
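
To make question 3 more concrete, here is a rough Python sketch of the round trip I have in mind. It is only my guess at a workflow, not something I know IA uses. The helper names (replace_word, export_cmd, import_cmd) are my own invention; the element names (WORD with a coords attribute) are what I see in djvutoxml output; and I have not checked whether djvuxmlparser minds that ElementTree drops the DOCTYPE line when it rewrites the file.

```python
import xml.etree.ElementTree as ET


def export_cmd(djvu_path, xml_path):
    # Command line that dumps the hidden text layer to XML (DjVuLibre djvutoxml).
    return ["djvutoxml", djvu_path, xml_path]


def import_cmd(djvu_path, xml_path):
    # Command line that loads edited XML back into the djvu file.
    # -o names the target djvu file explicitly.
    return ["djvuxmlparser", "-o", djvu_path, xml_path]


def replace_word(xml_text, old, new):
    """Replace the text of every WORD element equal to `old`.

    The coords attribute is left untouched, so the word keeps its
    position on the page. Returns (new_xml_text, number_of_changes).
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for word in root.iter("WORD"):
        if word.text == old:
            word.text = new
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

The two command lists could be passed to subprocess.run, with replace_word applied to the XML file in between. Only the word text changes, so this would not help when the OCR split or merged words wrongly.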

I imagine that the answers to my questions can be found somewhere in the IA docs, forums, or FAQs, but I'd appreciate some help finding them :-)